Towards Using Machine Translation Techniques to Induce Multilingual Lexica of Discourse Markers

Authors

  • António Luís Vilarinho dos Santos Lopes
  • David Martins de Matos
  • Vera Cabarrão
  • Ricardo Ribeiro
  • Helena Moniz
  • Isabel Trancoso
  • Ana Isabel Mata
Abstract

Discourse markers are universal linguistic events subject to language variation. Although an extensive literature has already reported language-specific traits of these events (e.g., [6,7,4,3,9]), little has been said about their cross-language behavior and, subsequently, about building an inventory of multilingual lexica of discourse markers. This work therefore describes new methods and approaches for the description, classification, and annotation of discourse markers in the specific domain of the Europarl corpus. Studying discourse markers in the context of translation is crucial because of the idiomatic nature of these structures (e.g., [1,2]). Multilingual lexica, together with a functional analysis of such structures, are useful tools for the hard task of translating discourse markers into possible equivalents from one language to another.

Similar resources

Towards producing bilingual lexica from monolingual corpora

Bilingual lexica are the basis for many cross-lingual natural language processing tasks. Recent work has shown success in learning bilingual dictionaries by taking advantage of comparable corpora and a diverse set of signals derived from monolingual corpora. In the present work, we describe an approach to automatically learn bilingual lexica by training a supervised classifier using word embed...

A Knowledge-Modeling Approach for Multilingual Regulus Lexica

Development of lexical resources is, along with grammar development, one of the main efforts when building multilingual NLP applications. In this paper, we present a tool-based approach for more efficient manual lexicon development for a spoken language translation system. The approach in particular addresses the common problems of multilingual lexica including the redundancy of encoded informa...

From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse

In the present paper, we analyse variation of discourse phenomena in two typologically different languages, i.e. in German and Czech. The novelty of our approach lies in the nature of the resources we are using. Advantage is taken of existing resources, which are, however, annotated on the basis of two different frameworks. We use an interoperable scheme unifying discourse phenomena in both fra...

A Bilingual Discourse Corpus and Its Applications

Existing discourse research focuses only on monolingual settings, and the inconsistency between languages limits the power of discourse theory in multilingual applications such as machine translation. To address this issue, we design and build a bilingual discourse corpus in which we are currently defining and annotating the bilingual elementary discourse units (BEDUs). The BEDUs are th...

Multilingual Annotation and Disambiguation of Discourse Connectives for Machine Translation

Many discourse connectives can signal several types of relations between sentences. Their automatic disambiguation, i.e. the labeling of the correct sense of each occurrence, is important for discourse parsing, but could also be helpful to machine translation. We describe new approaches for improving the accuracy of manual annotation of three discourse connectives (two English, one French) by u...

Journal:
  • CoRR

Volume abs/1503.09144

Publication date 2015